
Cooking The Code - Felix John COLIBRI.


1 - Why Source Code Filtering ?

For most of our customers, we provide the source code of our contracted projects. For many of our customer applications, we use a developer version that includes parts which are of no interest to the customer:
  • trial code
  • developer comments (todo list, add this, contact such and such for review, who corrected this little bit, see also some other project)
  • debug logging which could be more voluminous than usage logging
Before delivering the source, we could manually remove the unwanted parts, but, as with other "dual code" endeavours, this process is error prone and tedious:
  • we could miss some parts
  • we could remove too many lines
  • maintaining two versions of the code over time can become a nightmare
Therefore we use a small utility which removes the unwanted parts. This process is performed using simple innocuous comment markers and a companion filtering parser.




2 - The Delphi Code Cooker

2.1 - The Objective and the Grammar

The easiest way is to present a "before / after" example. Here is a small piece of code :



Let's assume that we want to remove

  • isolated rows or groups of rows
  • comments
To do so, we add markers, with the following effect (on the left the original, on the right, the filtered text):
  • to remove a row, we append "//-" at the end of the row

     
    unit _cooker_test;            |  unit _cooker_test; 
      interface                   |    interface 
      implementation              |    implementation 
        BBB                       |      BBB 
        aaa //-                   | 
        CCC                       |      CCC 
    

  • to remove row groups
    • "//>" and "//<"
      • either surrounding those lines

         
        CCC                       |      CCC 
        //>                       | 
        bbb                       | 
        ccc                       | 
        //<                       | 
        DDD                       |      DDD 
        

      • or at the end of the first and the last line

         
        DDD                       |      DDD 
        ddd //>                   | 
        eee                       | 
        fff //<                   | 
        EEE                       |      EEE 
        

  • for the other types of comments
    • first the "//" comments:
      • the special "// --- " is used to keep a comment

         
        EEE                       |      EEE 
        // --- FFF                |      // --- FFF 
        GGG                       |      GGG 
        

      • all other start of line "//" comments will be removed

         
        GGG                       |      GGG 
        //                        | 
        // ggg                    | 
        // --  hhh                | 
        HHH                       |      HHH 
        

      • the end of line // comments are removed, except those after BEGIN and END;

         
        HHH                       |      HHH 
        III // --  jjj            |      III 
        BEGIN //  JJJ             |      BEGIN // JJJ 
        END; //  KKK              |      END; // KKK 
        LLL                       |      LLL 
        

    • for the "(*" comments
      • the comments starting with (* on a line by itself will be removed

         
        JJJ                       |      JJJ 
        (*                        | 
        jjj                       | 
        kkk *)                    | 
        (*                        | 
        ppp *) QQQ                |      QQQ 
        

      • other "(*" comments will remain

         
        (* KKK                    |      (* KKK 
        LLL *)                    |      LLL*) 
        (*$R+*)                   |      (*$R+*) 
        MMM (* NNN                |      MMM (* NNN 
        OOO *) PPP                |      OOO *) PPP 
        

    • the rules are the same for "{"
      • the comments starting with { on a line by itself will be removed

         
        JJJ                       |      JJJ 
        {                         | 
        jjj                       | 
        kkk }                     | 
        {                         | 
        ppp } QQQ                 |      QQQ 
        

      • other "{" comments will remain

         
        { KKK                     |      { KKK 
        LLL }                     |      LLL} 
        {$R+}                     |      {$R+} 
        MMM { NNN                 |      MMM { NNN 
        OOO } PPP                 |      OOO } PPP 
        

Please note that
  • the rules depend to some extent on our coding habits. For instance, we use
    • "//" comments for explanations, with two dashes ("// -- ") to set them apart from the code

          // -- algorithm
          x:= 5;

    • end of line "//" comments to keep track of previous values:

          If line> 150 // 10
            Then

    • "(*" comments for contiguous block elimination (from the compilation). Those blocks can contain "//" comments. In addition, when we exclude some block from compilation, we place the "(*" and "*)" at the margin, on a separate line:

          x:= 5;
      (*
          // -- this is an explanation comment
          y:= 8;
      *)

          a:= b;

    • "{" for non contiguous block elimination. So "{" comments can enclose several "(*" blocks

          a:= b;
      {
          c:= d;
      (*
          e:= f;
      *)
          g:= h;
      (*
          i:= j;
      *)
      }

          k:= l;


  • this explains why we chose
    • the "//>" "//<" "//-" "// --- " as code cooking markers, since we never use those in our usual code
    • the "(*" and "{" on single lines to remove blocks. If we decide to keep some commented out block in the cooked code, we simply add any character after the "(*" or "{" (like "(*+" for instance)
  • for commented out code, we use "(*" on a single line, and also the matching "*)" on a line by itself. But to stay consistent with the compiler, we accept that the matching "*)" may be placed anywhere in a line

        x:= 5;
    (*
        y:= 8;
    *)

        a:= b;
    (*
        d:= 9; *)
     e:= 18
        f:= 15;
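
The last rule (an opening "(*" alone on its line, a closing "*)" anywhere in a later line) can be sketched as follows. This is only an illustration under our own assumptions, not the routine of the tool: the name remove_paren_star_blocks is hypothetical, the text found after "*)" is kept trimmed, and an unclosed block simply swallows the rest of the text.

```pascal
Program paren_star_sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

Uses SysUtils, Classes;

// -- hypothetical helper: copies p_c_source into p_c_result, dropping
// --   every block opened by a line which only contains "(*"
Procedure remove_paren_star_blocks(p_c_source, p_c_result: TStrings);
  Var l_index, l_close_position: Integer;
      l_remainder: String;
  Begin
    p_c_result.Clear;
    l_index:= 0;
    While l_index< p_c_source.Count Do
    Begin
      If Trim(p_c_source[l_index])= '(*'
        Then Begin
            // -- skip the lines until one of them contains "*)"
            Repeat
              Inc(l_index);
            Until (l_index>= p_c_source.Count)
              Or (Pos('*)', p_c_source[l_index])> 0);

            If l_index< p_c_source.Count
              Then Begin
                  // -- keep whatever follows "*)" on the closing line
                  l_close_position:= Pos('*)', p_c_source[l_index]);
                  l_remainder:= Trim(Copy(p_c_source[l_index],
                      l_close_position+ 2, MaxInt));
                  If l_remainder<> ''
                    Then p_c_result.Add(l_remainder);
                End;
          End
        Else p_c_result.Add(p_c_source[l_index]);
      Inc(l_index);
    End; // while l_index
  End; // remove_paren_star_blocks

Var g_c_source, g_c_result: TStringList;

Begin
  g_c_source:= TStringList.Create;
  g_c_result:= TStringList.Create;
  Try
    g_c_source.Add('    x:= 5;');
    g_c_source.Add('(*');
    g_c_source.Add('    y:= 8;');
    g_c_source.Add('    d:= 9; *) e:= 18;');
    g_c_source.Add('    f:= 15;');

    remove_paren_star_blocks(g_c_source, g_c_result);

    // -- the lines x:= 5; e:= 18; and f:= 15; remain
    Write(g_c_result.Text);
  Finally
    g_c_result.Free;
    g_c_source.Free;
  End;
End.
```

A "(*+" line is not equal to "(*" after trimming, so such a block goes through untouched, which matches the "add any character" convention described above.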




2.2 - The Delphi Code

Basically we use a tStrings to analyze the text, looking for the different markers using
  • Copy for start of line extraction
  • Pos for the other lookups
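
As a minimal sketch of these two lookups (the function names below are ours and do not come from the tool's source): Copy tests a fixed size prefix once the leading blanks have been skipped, while Pos finds a marker anywhere in the line and returns 0 when it is absent.

```pascal
Program copy_pos_sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

Uses SysUtils;

// -- hypothetical helper: index of the first non blank character
// --   (Length+ 1 if the line is empty or only contains blanks)
Function f_first_non_blank(Const p_line: String): Integer;
  Begin
    Result:= 1;
    While (Result<= Length(p_line)) And (p_line[Result]= ' ') Do
      Inc(Result);
  End; // f_first_non_blank

// -- start of line test: Copy extracts a fixed size prefix
Function f_starts_with_slash_slash(Const p_line: String): Boolean;
  Begin
    Result:= Copy(p_line, f_first_non_blank(p_line), 2)= '//';
  End; // f_starts_with_slash_slash

// -- anywhere test: Pos returns 0 if the marker is not in the line
Function f_has_remove_row_marker(Const p_line: String): Boolean;
  Begin
    Result:= Pos('//-', p_line)> 0;
  End; // f_has_remove_row_marker

Begin
  // -- a start of line comment, then an end of line comment
  Writeln(f_starts_with_slash_slash('    // ggg'));
  Writeln(f_starts_with_slash_slash('    III // jjj'));
  // -- a row with the "//-" marker, then a row without it
  Writeln(f_has_remove_row_marker('    aaa //-'));
  Writeln(f_has_remove_row_marker('    BBB'));
End.
```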


2.2.1 - The Class definition

The worker Class is very simple

Type c_remove_marked_up_text=
         Class(c_basic_object)
           m_c_original_list: tStringList;
           m_c_result_list: tStringList;

           Constructor create_remove_marked_up_text(p_name: String);
           Procedure remove_marked_up_text;
           Destructor Destroy; Override;
         End; // c_remove_marked_up_text



2.2.2 - The main filtering loop

The main loop is also very simple
  • we read each line
  • we call Functions, each of which tests one of the marker rules, and
    • decides how to transform the line (mostly a keep or throw away choice)
    • returns True if the line has been handled in this Function


Here is the main loop:

Procedure c_remove_marked_up_text.remove_marked_up_text;
  Var l_list_index: Integer;
      l_the_line, l_trimmed_line: String;

      Function f_end_of_file: Boolean;
        //  -- ooo

      Procedure read_line;
        // -- ooo

      Procedure add_result_line;
        // -- ooo

      Function f_remove_start_of_line_slash_comments: Boolean;
        // -- ooo

  Begin // remove_marked_up_text
    m_c_result_list.Clear;

    l_list_index:= 0;
    read_line;

    While Not f_end_of_file Do
    Begin
      If f_remove_start_of_line_slash_comments Then Else
      If f_remove_parenthesis_star_comment Or f_remove_brace_comment Then Else
      If f_remove_ds_ending_slash_comments Then Else
      If f_remove_ds_endig_slash_minus_comments Then Else
      If f_remove_ds_middle_line_slash_comments Then Else
          Begin
            add_result_line;
            read_line;
          End;
    End; // while l_the_line

    If l_the_line<> ''
      Then add_result_line;
  End; // remove_marked_up_text
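
The "If f_xxx Then Else" chain works because each filter Function both tests and consumes: when it returns True, it has already kept or dropped the current line and read the next one, so the empty Then branch has nothing left to do. The self-contained miniature below illustrates this with two rules only. The helper bodies are our own guesses (the article elides them), and the read-ahead handling is simplified, so this is a sketch and not the code of the tool.

```pascal
Program filter_loop_sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

Uses SysUtils, Classes;

Procedure filter_lines(p_c_original, p_c_result: TStrings);
  Var l_list_index: Integer;
      l_the_line: String;

  Function f_end_of_file: Boolean;
    Begin
      Result:= l_list_index>= p_c_original.Count;
    End;

  Procedure read_line;
    // -- assumption: move to the next line, load it if it exists
    Begin
      Inc(l_list_index);
      If Not f_end_of_file
        Then l_the_line:= p_c_original[l_list_index];
    End;

  Procedure add_result_line;
    Begin
      p_c_result.Add(l_the_line);
    End;

  Function f_remove_start_of_line_slash_comment: Boolean;
    Var l_trimmed_line: String;
    Begin
      l_trimmed_line:= TrimLeft(l_the_line);
      Result:= Copy(l_trimmed_line, 1, 2)= '//';
      If Result
        Then Begin
            // -- "// --- " comments are kept, all others are dropped
            If Copy(l_trimmed_line, 3, 5)= ' --- '
              Then add_result_line;
            read_line;
          End;
    End;

  Function f_remove_ending_slash_minus: Boolean;
    Begin
      Result:= Pos('//-', l_the_line)> 0;
      If Result
        Then read_line;
    End;

  Begin // filter_lines
    p_c_result.Clear;
    l_list_index:= -1;
    read_line;

    While Not f_end_of_file Do
      If f_remove_start_of_line_slash_comment Then Else
      If f_remove_ending_slash_minus Then Else
        Begin
          add_result_line;
          read_line;
        End;
  End; // filter_lines

Var g_c_original, g_c_result: TStringList;

Begin
  g_c_original:= TStringList.Create;
  g_c_result:= TStringList.Create;
  Try
    g_c_original.Add('    BBB');
    g_c_original.Add('    aaa //-');
    g_c_original.Add('    // ggg');
    g_c_original.Add('    // --- FFF');
    g_c_original.Add('    CCC');

    filter_lines(g_c_original, g_c_result);

    // -- BBB, "// --- FFF" and CCC remain
    Write(g_c_result.Text);
  Finally
    g_c_result.Free;
    g_c_original.Free;
  End;
End.
```

Since a Function that returns True has already called read_line, the order of the tests decides which rule wins when a line matches several of them.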



2.2.3 - Example of start of line marker

We handle the "//" at the start of the line in the following way:

Function f_remove_start_of_line_slash_comments: Boolean;
    // -- True if has a "__//xxxxxx" line

  Procedure erase_slash_greater_bloc;
      // -- a "__//>  ... __//<" block
    Begin
      Repeat
        read_line;
      Until f_end_of_file Or (Copy(l_the_line, l_index, 3)= '//<');
      read_line;
    End; // erase_slash_greater_bloc

  Begin // f_remove_start_of_line_slash_comments
    If Copy(l_the_line, l_index, 2)= '//'
      Then Begin
          Result:= True;
          // -- keep if "// --- "
          If Copy(l_the_line, l_index+ 2, 5)= ' --- '
            Then Begin
                add_result_line;
                read_line;
              End
            Else
              If Copy(l_the_line, l_index, 3)= '//>'
                Then erase_slash_greater_bloc
                Else read_line;
        End
      Else Result:= False;
  End; // f_remove_start_of_line_slash_comments



2.2.4 - Example of middle of line marker

We remove the middle "//" with code like this:

Function f_remove_ds_middle_line_slash_comments: Boolean;
    // -- "xxx // yyy"
    // -- CAUTION: no // within a string
  Var l_slash_slash_position: Integer;
  Begin
    l_slash_slash_position:= Pos('//', l_the_line);
    If (l_slash_slash_position> 1)
        And Not
          (
              (Pos('end; // ', LowerCase(l_the_line))> 0)
            Or
              (Pos('begin // ', LowerCase(l_the_line))> 0)
           )
      Then Begin
          Result:= True;
          Delete(l_the_line, l_slash_slash_position,
              Length(l_the_line)+ 1- l_slash_slash_position);
          add_result_line;
          read_line;
        End
      Else Result:= False;
  End; // f_remove_ds_middle_line_slash_comments

Please note that

  • this code works since we removed the start of line "//" and ending "//-", "//>", "//<" before (see the main loop). So our code depends on the calling order of our filtering functions
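
The CAUTION comment in the Function above points at a limitation: Pos also finds a "//" located inside a string literal (a URL, for instance). One possible way around it, given here as a hedged sketch and not as part of the tool, is to scan the line while tracking whether we are inside quotes. The name f_comment_position is hypothetical, and "//" inside "{" or "(*" comments is not handled.

```pascal
Program slash_in_string_sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

Uses SysUtils;

// -- hypothetical helper: position of the first "//" located outside
// --   of a quoted Pascal string, or 0 if there is none
Function f_comment_position(Const p_line: String): Integer;
  Var l_index: Integer;
      l_in_string: Boolean;
  Begin
    Result:= 0;
    l_in_string:= False;
    l_index:= 1;
    // -- stop one character before the end: we look at pairs
    While l_index< Length(p_line) Do
    Begin
      If p_line[l_index]= ''''
        // -- a doubled quote toggles twice, so we stay in the string
        Then l_in_string:= Not l_in_string
        Else
          If Not l_in_string And (p_line[l_index]= '/')
              And (p_line[l_index+ 1]= '/')
            Then Begin
                Result:= l_index;
                Exit;
              End;
      Inc(l_index);
    End; // while l_index
  End; // f_comment_position

Begin
  Writeln(f_comment_position('x:= 5; // 10'));                    // 8
  Writeln(f_comment_position('url:= ''http://site''; // home'));  // 22
  Writeln(f_comment_position('url:= ''http://site'';'));          // 0
End.
```

The returned position could replace the Pos('//', ...) result before the Delete call, at the cost of a character by character scan.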


2.2.5 - The main Form

The main form is quite standard. Here is a snapshot of this form:

cooking_the_code

where

  • the help is a memo saved in a .txt file (can be filled by the user)
  • the favorites are loaded from a .txt file (same function as a filter combo box, but with a fixed format instead of a drop down)
  • the source code to filter is defined by its path and file name
  • the destination is defined by its path (with same file name), and the purple edit can be used to create the sub folder
  • selecting a source .PAS file
    • loads the text in the "original_" memo (where it still can be modified and saved to disc)
    • computes the filtered result
    • this result is presented in the "result_" memo (where we still can modify and save it)



3 - Unit Test to Test the Code Cooker

To test that our filtering routines correctly remove the marked up code, we wrote a unit test Class.

The test Class definition is

Type c_remove_marker_test=
         Class(c_test_case)
           Private
             m_c_remove_sncf: c_remove_marked_up_text;
           Protected
             Procedure Setup; Override;
             Procedure Teardown; Override;
           Published
             // -- all the tests
             Procedure test_slash_comment;
             Procedure test_slash_comment_enclosing;
             Procedure test_slash_eol_comment_enclosing;
             Procedure test_slash_keep_comment;
             Procedure test_slash_remove_comment;
             Procedure test_remove_eol_slash_comment;
             Procedure test_remove_parenthesis_star_comment;
             Procedure test_keep_parenthesis_star_comment;
             Procedure test_remove_brace_comment;
             Procedure test_keep_brace_comment;
         End; // c_remove_marker_test

and, as an example, here is the test of the enclosing "//>" and "//<" markers:

Procedure c_remove_marker_test.test_slash_comment_enclosing;
  Begin
    With m_c_remove_sncf, m_c_original_list Do
    Begin
      Add('    CCC');
      Add('    //>');
      Add('    bbb');
      Add('    ccc');
      Add('    //<');
      Add('    DDD');

      remove_marked_up_text;

      Check(m_c_result_list.Strings[0]= m_c_original_list[0],
          '"enclosing //> //<", removed line 0') ;
      Check(m_c_result_list.Strings[1]= m_c_original_list[5],
          '"enclosing //> //<", removed line 5') ;

      Check(m_c_result_list.Count, Count- 4+ 1,
          '"enclosing //> //<", count') ;
    End; // with m_c_remove_sncf
  End; // test_slash_comment_enclosing

The result of the test looks like:

cooking_the_code_test




4 - Comments and Improvements

This tool suits our needs, because it neatly fits our coding conventions.

However, it can easily be improved. You might

  • change the markers (for instance, use "//%" instead of "//-")
  • use more conspicuous markers. We decided to make them as discreet as possible, since we do not want to notice them while developing the code
  • change the semantics (what you want to filter out or keep)
  • add additional filtering rules (or remove some of our own)
It could also be useful to add some checking procedures (to check that the "//>" and "//<" counts match, etc.)
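
As an illustration of such a check, here is a minimal sketch which verifies that every "//>" is closed by a "//<" before the next "//>" appears. The function name, the "no nesting" assumption and the choice to look for the markers anywhere in the line are ours, not features of the tool.

```pascal
Program marker_check_sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

Uses SysUtils, Classes;

// -- hypothetical checker: returns -1 if every "//>" is followed by
// --   a matching "//<" (no nesting), otherwise the 0 based index
// --   of the first offending line
Function f_first_unbalanced_marker(p_c_lines: TStrings): Integer;
  Var l_index, l_open_index: Integer;
  Begin
    l_open_index:= -1;
    For l_index:= 0 To p_c_lines.Count- 1 Do
      If Pos('//>', p_c_lines[l_index])> 0
        Then Begin
            If l_open_index>= 0
              Then Begin
                  // -- a second "//>" before the closing "//<"
                  Result:= l_index;
                  Exit;
                End;
            l_open_index:= l_index;
          End
        Else
          If Pos('//<', p_c_lines[l_index])> 0
            Then Begin
                If l_open_index< 0
                  Then Begin
                      // -- a "//<" without an opening "//>"
                      Result:= l_index;
                      Exit;
                    End;
                l_open_index:= -1;
              End;

    // -- still open at the end of the text: report the opening line
    Result:= l_open_index;
  End; // f_first_unbalanced_marker

Var g_c_lines: TStringList;

Begin
  g_c_lines:= TStringList.Create;
  Try
    g_c_lines.Add('    CCC');
    g_c_lines.Add('    //>');
    g_c_lines.Add('    bbb');
    g_c_lines.Add('    //<');
    g_c_lines.Add('    DDD');
    Writeln(f_first_unbalanced_marker(g_c_lines));   // -1: balanced

    g_c_lines.Clear;
    g_c_lines.Add('    ddd //>');
    g_c_lines.Add('    eee');
    Writeln(f_first_unbalanced_marker(g_c_lines));   // 0: never closed
  Finally
    g_c_lines.Free;
  End;
End.
```

Running such a check before the filter would catch a forgotten "//<" before it silently swallows the rest of the unit.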

The filter is similar to weaving (we consider two aspects of the same code), but our process is unidirectional (there is no easy way to integrate customer changes in the original source).

There are some obvious drawbacks:

  • we are only removing areas. It would be quite tedious to remove variables or some procedure parameters
  • there is no check as to the consistency of the filtering (we could remove a class method definition, and not remove its implementation, or remove a Uses name, only to find out after compilation that some imported information is still used in the filtered code)
  • the technique is invasive (the original code is modified by inserting markers to perform the filtering)
In our working tool we also added the removal of procedure declarations, implementations and calls. We hand over a list of procedure names, and the filter removes them everywhere. The nice thing is that this approach is not invasive. It does, however, require a lexical analyzer, which the filter presented in this article avoids.

Based on the success of this article, on time, and on popular demand ( :) ) we could (well, cook up, and then) publish this procedure filtering tool, with its scanner, parser and unit test code, in a companion article on this site.

As a last note, we used the term "cooking the code" as a nod to the very funny "cooking the books" expression.




5 - Download the Source Code

Here are the source code files: The .ZIP file(s) contain:
  • the main program (.DPR, .DOF, .RES), the main form (.PAS, .DFM), and any other auxiliary form
  • any .TXT for parameters, samples, test data
  • all units (.PAS)
Those .ZIP
  • are self-contained: you will not need any other product (unless expressly mentioned).
  • for Delphi 6 projects, can be used from any folder (the paths are RELATIVE)
  • will not modify your PC in any way beyond the path where you placed the .ZIP (no registry changes, no path creation etc).
To use the .ZIP:
  • create or select any folder of your choice
  • unzip the downloaded file
  • using Delphi, compile and execute
To remove the .ZIP simply delete the folder.

The Pascal code uses the Alsacian notation, which prefixes identifiers by program area: K_onstant, T_ype, G_lobal, L_ocal, P_arametre, F_unction, C_lass etc. This notation is presented in the Alsacian Notation paper.



As usual:

  • please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken links or had some problem downloading the file. Resulting corrections will be helpful for other readers
  • we welcome any comment, criticism, enhancement, other sources or reference suggestion. Just send an e-mail to fcolibri@felix-colibri.com.

  • and if you liked this article, talk about this site to your fellow developers, add a link to your links page or mention our articles in your blog or newsgroup posts when relevant. That's the way we operate: the more traffic and Google references we get, the more articles we will write.


6 - References

The Unit Test project and its source code are presented in the Unit Test Framework article




7 - The author

Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi Xe_n migrations, refactoring), Delphi Consulting and Delphi training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP  /  UML, Design Patterns, Unit Testing training sessions.
Created: jun-09. Last updated: jul-15 - 98 articles, 131 .ZIP sources, 1012 figures
Copyright © Felix J. Colibri   http://www.felix-colibri.com 2004 - 2015. All rights reserved